TODO: Refine title
Initial Questions
TODO: Must have at least two questions. Ideally they cover different problem types, i.e. one regression and one classification.
Objective
TODO: Analysis: Identify the questions. What is the objective of processing this dataset? What answers are you hoping to find in it?
TODO: Determine the details of the dataset (e.g. title, year, purpose, dimensions, content, structure, summary) by exploring the raw data.
TODO: Short introduction with objective of the project.
Data Cleaning and Preprocessing
TODO: Which section of the data do you need to tidy?
TODO: Prepare data for analysis by correcting the variables and contents of the data.
TODO: Putting it all together as a new cleaned/processed dataset: For this task, you are also encouraged to explore any cleaning packages in R other than those learned in the course (dplyr, tidyr, lubridate, etc.).
# if (!require('dplyr')) install.packages('dplyr'); library('dplyr')
# if (!require('tidyr')) install.packages('tidyr'); library('tidyr')
# Install each package only if it is not already available.
# A repository is given explicitly so installation also works in a non-interactive knit.
if (!require('lubridate'))
  install.packages('lubridate', repos='https://cran.asia/')
if (!require('tidyquant'))
  install.packages('tidyquant', repos='https://cran.asia/')
if (!require('plotly'))
  install.packages('plotly', repos='https://cran.asia/')
library('lubridate')
library('plotly')
library('tidyquant')
Data Ingestion
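# Remote source (MoH Malaysia public repository); uncomment to read from GitHub instead of the local copy.
# covid_malaysia_endpoint <- "https://raw.githubusercontent.com/MoH-Malaysia/covid19-public/main/epidemic/cases_malaysia.csv"
covid_malaysia_endpoint <- 'cases_malaysia.csv'
covid_malaysia_df <- read.csv(covid_malaysia_endpoint, header=TRUE)
# Parse the date column from character (YYYY-MM-DD) to Date.
covid_malaysia_df$date <- as.Date(covid_malaysia_df$date, format="%Y-%m-%d")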
str(covid_malaysia_df)
## 'data.frame': 708 obs. of 31 variables:
## $ date : Date, format: "2020-01-25" "2020-01-26" ...
## $ cases_new : int 4 0 0 0 3 1 0 0 0 0 ...
## $ cases_import : int 4 0 0 0 3 1 0 0 0 0 ...
## $ cases_recovered : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases_active : int 4 4 4 4 7 8 8 8 8 8 ...
## $ cases_cluster : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases_unvax : int 4 0 0 0 3 1 0 0 0 0 ...
## $ cases_pvax : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases_fvax : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases_boost : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases_child : int 0 0 0 0 1 0 0 0 0 0 ...
## $ cases_adolescent : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases_adult : int 1 0 0 0 2 1 0 0 0 0 ...
## $ cases_elderly : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases_0_4 : int 0 0 0 0 1 0 0 0 0 0 ...
## $ cases_5_11 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases_12_17 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases_18_29 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases_30_39 : int 0 0 0 0 1 0 0 0 0 0 ...
## $ cases_40_49 : int 1 0 0 0 0 1 0 0 0 0 ...
## $ cases_50_59 : int 0 0 0 0 1 0 0 0 0 0 ...
## $ cases_60_69 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases_70_79 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cases_80 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cluster_import : int NA NA NA NA NA NA NA NA NA NA ...
## $ cluster_religious : int NA NA NA NA NA NA NA NA NA NA ...
## $ cluster_community : int NA NA NA NA NA NA NA NA NA NA ...
## $ cluster_highRisk : int NA NA NA NA NA NA NA NA NA NA ...
## $ cluster_education : int NA NA NA NA NA NA NA NA NA NA ...
## $ cluster_detentionCentre: int NA NA NA NA NA NA NA NA NA NA ...
## $ cluster_workplace : int NA NA NA NA NA NA NA NA NA NA ...
dim(covid_malaysia_df)
## [1] 708 31
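The `str()` output shows that the `cluster_*` columns begin with `NA` values, so a missing-value check is a natural first cleaning step. The following is a minimal sketch rather than the project's actual cleaning code. The helper names `count_missing` and `fill_na_zero` and the toy data frame are hypothetical, and replacing `NA` with 0 assumes that a missing cluster count can be treated as zero, which should be confirmed against the dataset documentation.

```r
# Hypothetical helper: number of missing values in each column.
count_missing <- function(df) {
  colSums(is.na(df))
}

# Hypothetical helper: replace NA with 0 in every column whose name starts with `prefix`.
# Assumption: a missing cluster count can be treated as 0 for this analysis.
fill_na_zero <- function(df, prefix = "cluster_") {
  cols <- grep(paste0("^", prefix), names(df), value = TRUE)
  if (length(cols) == 0) return(df)
  df[cols] <- lapply(df[cols], function(x) {
    x[is.na(x)] <- 0L
    x
  })
  df
}

# Toy data frame standing in for covid_malaysia_df (illustrative values only).
toy_df <- data.frame(
  date = as.Date(c("2020-01-25", "2020-01-26", "2020-01-27")),
  cases_new = c(4L, 0L, 0L),
  cluster_import = c(NA, NA, 2L),
  cluster_workplace = c(NA, 1L, NA)
)

missing_before <- count_missing(toy_df)
toy_clean <- fill_na_zero(toy_df)
missing_after <- count_missing(toy_clean)

print(missing_before)
print(missing_after)
```

On the real data, the same helpers could be applied as `count_missing(covid_malaysia_df)` followed by `fill_na_zero(covid_malaysia_df)`, provided the zero-fill assumption holds.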
Exploratory Data Analysis
TODO: Results may include visualization, prediction, evaluation of models and discussion of output
A brief look at the graph
# Line chart of daily new cases over time.
fig <- plot_ly(covid_malaysia_df, type = 'scatter', mode = 'lines') %>%
  add_trace(x = ~date, y = ~cases_new, name = 'Daily New COVID Cases') %>%
  layout(showlegend = FALSE)
# Suppress warnings for the rest of the session.
options(warn = -1)
# Style the axes and the plot background.
fig <- fig %>%
  layout(
    xaxis = list(zerolinecolor = '#ffff',
                 zerolinewidth = 2,
                 gridcolor = '#ffff'),
    yaxis = list(zerolinecolor = '#ffff',
                 zerolinewidth = 2,
                 gridcolor = '#ffff'),
    plot_bgcolor = '#e5ecf6', width = 1200)
fig
Machine Learning
TODO: Results may include visualization, prediction, evaluation of models and discussion of output
Conclusion
TODO: Conclusion
Presentation and Submission
TODO: Report: The submission will be an R Markdown document published on RPubs, with the link submitted in Spectrum. The R Markdown document may include the following:
- Short introduction with objective of the project.
- Explanation of all the processes involved in the project
- Results may include visualization, prediction, evaluation of models and discussion of output
- Conclusion
TODO: Only one member per group will submit the report.
TODO: Each group is required to prepare a 10-minute presentation with PowerPoint.
TODO: Both group members must present their parts.